Highest Probability SVM Nearest Neighbor Classifier for Spam Filtering
Authors
Abstract
In this paper we evaluate the performance of the highest probability SVM nearest neighbor classifier, a combination of the SVM and k-NN classifiers, on a corpus of email messages. To classify a sample, the algorithm does the following: for each k in a predefined set {k1, ..., kN}, it trains an SVM model on the k nearest labelled samples, uses this model to classify the given sample, fits a sigmoid approximation of the probabilistic output of the SVM model, and computes the probabilities of the positive and the negative answers; then it selects, from the 2 × N resulting answers, the one with the highest probability. The experimental evaluation shows that this algorithm can achieve higher accuracy than the pure SVM classifier, at least in the case of equal error costs.
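The selection procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the Platt-style sigmoid form 1 / (1 + exp(a·score + b)), and the names `hp_svm_nn_classify`, `fit_local_model`, and `mean_label_stub` are assumptions introduced here. The local SVM training and sigmoid fitting are left to a caller-supplied function; the stub used in the example is only a stand-in and is not an SVM.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (features, label), label is +1 (spam) or -1 (legitimate)


def euclidean(a: Vector, b: Vector) -> float:
    """Distance used to rank neighbours (an assumption; the abstract names no metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid_probability(score: float, a: float, b: float) -> float:
    """Platt-style estimate of P(label = +1 | score), computed without overflow."""
    z = a * score + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def hp_svm_nn_classify(
    x: Vector,
    labelled: Sequence[Sample],
    ks: Sequence[int],
    fit_local_model: Callable[[Sequence[Sample], Vector], Tuple[float, float, float]],
) -> Tuple[int, float]:
    """Return the (label, probability) pair with the highest probability
    among the 2 * N candidate answers, one positive and one negative per k.

    fit_local_model(neighbours, x) is hypothetical: it should train a model
    on the neighbours and return (decision score for x, sigmoid a, sigmoid b).
    """
    by_distance = sorted(labelled, key=lambda s: euclidean(s[0], x))
    best_label, best_prob = 0, -1.0
    for k in ks:
        neighbours = by_distance[:k]  # the k nearest labelled samples
        score, a, b = fit_local_model(neighbours, x)
        p_pos = sigmoid_probability(score, a, b)
        for label, prob in ((+1, p_pos), (-1, 1.0 - p_pos)):
            if prob > best_prob:
                best_label, best_prob = label, prob
    return best_label, best_prob


def mean_label_stub(neighbours: Sequence[Sample], x: Vector) -> Tuple[float, float, float]:
    """Stand-in for SVM training: scores x by the mean neighbour label and
    uses fixed sigmoid parameters. Replace with a real SVM and a fitted sigmoid."""
    score = sum(label for _, label in neighbours) / len(neighbours)
    return score, -2.0, 0.0


labelled = [
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.2, 0.8), 1),
    ((0.0, 0.0), -1), ((0.1, -0.1), -1), ((-0.2, 0.1), -1),
]
label, prob = hp_svm_nn_classify((0.9, 0.9), labelled, ks=[1, 3, 5],
                                 fit_local_model=mean_label_stub)
print(label, round(prob, 3))
```

Passing the model fitting in as a callable keeps the sketch independent of any particular SVM library; in practice `fit_local_model` would wrap a real SVM trainer and a sigmoid fit on its outputs.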
Similar resources
Evaluation of the Highest Probability SVM Nearest Neighbor Classifier with Variable Relative Error Cost
In this paper we evaluate the performance of the highest probability SVM nearest neighbor (HP-SVM-NN) classifier, which combines the ideas of the SVM and k-NN classifiers, on the task of spam filtering. To classify a sample, the HP-SVM-NN classifier does the following: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labeled samples, uses this model to classify t...
Instance-Based Spam Filtering Using SVM Nearest Neighbor Classifier
In this paper we evaluate an instance-based spam filter based on the SVM nearest neighbor (SVM-NN) classifier, which combines the ideas of SVM and k-nearest neighbor. To label a message the classifier first finds k nearest labeled messages, and then an SVM model is trained on these k samples and used to label the unknown sample. Here we present preliminary results of the comparison of SVM-NN wi...
E-mail Spam Filtering with Local SVM Classifiers
This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the ...
Improving spam filtering by combining Naive Bayes with simple k-nearest neighbor searches
Using naive Bayes for email classification has become very popular within the last few months. Naive Bayes filters are quite easy to implement and very efficient. In this paper we present empirical results of email classification using a combination of naive Bayes and k-nearest neighbor searches. Using this technique we show that the accuracy of a Bayes filter can be improved slightly for a high numbe...
A Novel Method for Detecting Spam Email using KNN Classification with Spearman Correlation as Distance Measure
E-mail is one of the most prevalent methods of correspondence because of its availability, quick message exchange, and low sending cost. Spam mail is a serious issue affecting this application on today's internet. Spam may contain suspicious URLs, or may ask for financial information such as money exchange information or credit card details. Here comes the scope of filtering spam from legitimate em...